Towards Semantic Web Information Extraction

Authors

  • Borislav Popov
  • Atanas Kiryakov
  • Dimitar Manov
  • Angel Kirilov
  • Damyan Ognyanoff
  • Miroslav Goranov

Abstract

The approach towards Semantic Web Information Extraction (IE) presented here is implemented in KIM, a platform for semantic indexing, annotation, and retrieval. It combines IE based on the mature text engineering platform GATE with Semantic Web-compliant knowledge representation and management. The cornerstone is the automatic generation of named-entity (NE) annotations with class and instance references to a semantic repository. A simplistic upper-level ontology, providing detailed coverage of the most popular entity types (Person, Organization, Location, etc.; more than 250 classes), is designed and used. A knowledge base (KB) with de facto exhaustive coverage of real-world entities of general importance is maintained, used, and constantly enriched. Extensions of the ontology and KB handle all the lexical resources used for IE; most notably, instead of being kept in gazetteer lists, the aliases of specific entities are stored together with those entities in the KB. A Semantic Gazetteer uses the KB to generate lookup annotations. Ontology-aware pattern-matching grammars allow precise class information to be handled via rules at the optimal level of generality. The grammars are used to recognize NEs, with class and instance information referring to the KIM ontology and KB. Recognition of identity relations between entities is used to unify their references to the KB. Based on the recognized NEs, template relation construction is performed via grammar rules; as a result, the KB is enriched with the recognized relations between entities. In the final phase of the IE process, previously unknown aliases and entities are added to the KB with their specific types.
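The Semantic Gazetteer and the ontology-aware matching described above can be illustrated with a minimal Python sketch. It is not KIM's or GATE's API: the ontology fragment, the KB entries, and the names `ONTOLOGY`, `KB`, `Lookup`, `build_gazetteer`, `lookup`, `is_subclass`, and `match_class` are all hypothetical. The sketch assumes that aliases are stored with their KB instances, that lookups are found by greedy longest match over whitespace tokens, and that a rule written against a general class (e.g. Organization) should also accept annotations of its subclasses (e.g. Company).

```python
from dataclasses import dataclass

# Hypothetical fragment of an upper-level ontology: class -> direct superclass.
ONTOLOGY = {
    "Entity": None,
    "Organization": "Entity",
    "Company": "Organization",
    "Location": "Entity",
    "City": "Location",
    "Person": "Entity",
}

# Hypothetical KB: instance reference -> (most specific class, aliases).
# Aliases live with the entity instead of in a separate gazetteer list.
KB = {
    "kb:IBM": ("Company", ["IBM", "International Business Machines"]),
    "kb:Paris": ("City", ["Paris"]),
}


@dataclass
class Lookup:
    start: int   # index of the first matched token
    end: int     # index one past the last matched token
    cls: str     # class reference into the ontology
    inst: str    # instance reference into the KB


def build_gazetteer(kb):
    """Index every alias (as a token tuple) to the KB instances it may denote."""
    index = {}
    for inst, (cls, aliases) in kb.items():
        for alias in aliases:
            index.setdefault(tuple(alias.split()), []).append((inst, cls))
    return index


def lookup(tokens, index):
    """Greedy longest-match scan; emits one Lookup per candidate instance."""
    max_len = max(len(key) for key in index)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            hits = index.get(tuple(tokens[i:i + n]))
            if hits:
                out.extend(Lookup(i, i + n, cls, inst) for inst, cls in hits)
                i += n
                break
        else:  # no alias starts at token i
            i += 1
    return out


def is_subclass(cls, ancestor, ontology=ONTOLOGY):
    """True if cls equals ancestor or inherits from it in the ontology."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = ontology[cls]
    return False


def match_class(annotations, target):
    """Ontology-aware rule condition: keep lookups subsumed by the target class."""
    return [a for a in annotations if is_subclass(a.cls, target)]


tokens = "International Business Machines opened an office in Paris".split()
anns = lookup(tokens, build_gazetteer(KB))
print([(a.inst, a.cls) for a in match_class(anns, "Organization")])
# prints [('kb:IBM', 'Company')]
```

Because each lookup carries an instance reference as well as a class, a later step in this sketch could treat two mentions as identical simply by comparing their `inst` values, and a rule stated once for Organization covers every subclass without listing them.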

Similar Sources

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Towards Cross-Media Feature Extraction

In this paper we describe past and present work dealing with the use of textual resources, out of which semantic information can be extracted in order to provide for semantic annotation and indexing of associated image or video material. Since the emergence of semantic web technologies and resources, entities, relations and events extracted from textual resources by means of Information Extract...

French-Written Event Extraction Based on Contextual Exploration

Event extraction is a significant task in information extraction. This importance increases more and more with the explosion of textual data available on the Web, the appearance of Web 2.0 and the tendency towards the Semantic Web. Thus, we propose a generic approach to extract events from text and to analyze them. We propose an event extraction algorithm with a polynomial complexity O(n), and ...

Towards Knowledge Acquisition from Information Extraction

In our research to use information extraction to help populate the semantic web, we have encountered significant obstacles to interoperability between the technologies. We believe these obstacles to be endemic to the basic paradigms, and not quirks of the specific implementations we have worked with. In particular, we identify five dimensions of interoperability that must be addressed to succes...

An Overview of the Semantic Web Improving Web Data Accessibility and Performance

The Internet has known a very fast evolution, going from the Web 1.0, i.e., the traditional Web where users are merely consumers of static information, to the more dynamic Web 2.0, known as the Social or Collaborative Web, where users produce and consume information simultaneously, and heading toward the more sophisticated and eagerly anticipated Web 3.0, better known as the Semantic Web: exten...

Towards the semantic web in e-tourism: can annotation do the trick?

Semantic Web technology may support more advanced E-Commerce. Namely the representation of products and services in the form of ontologies will simplify the automated extraction and processing of explicit information and will make implicit information available for the discovery and comparison of offerings. One common assumption is that the Semantic Web can be made a reality by gradually augmen...

Publication date: 2003